Quantitative Missense Variant Effect Prediction Using Large-Scale Mutagenesis Data.
Authors
Abstract
Large datasets describing the quantitative effects of mutations on protein function are becoming increasingly available. Here, we leverage these datasets to develop Envision, which predicts the magnitude of a missense variant's molecular effect. Envision combines 21,026 variant effect measurements from nine large-scale experimental mutagenesis datasets, a hitherto untapped training resource, with a supervised, stochastic gradient boosting learning algorithm. Envision outperforms other missense variant effect predictors both on large-scale mutagenesis data and on an independent test dataset comprising 2,312 TP53 variants whose effects were measured using a low-throughput approach. This dataset was never used for hyperparameter tuning or model training and thus serves as an independent validation set. Envision prediction accuracy is also more consistent across amino acids than other predictors. Finally, we demonstrate that Envision's performance improves as more large-scale mutagenesis data are incorporated. We precompute Envision predictions for every possible single amino acid variant in human, mouse, frog, zebrafish, fruit fly, worm, and yeast proteomes (https://envision.gs.washington.edu/).
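The abstract names a supervised, stochastic gradient boosting algorithm but does not describe its configuration. As a rough illustration of the general technique only, the following is a minimal pure-Python sketch of stochastic gradient boosting for regression with squared-error loss and one-split trees (stumps). Everything in it is an assumption made for illustration: the two numeric features, the synthetic effect scores, the hyperparameters, and the function names `fit_stump`, `fit_boosting`, and `predict` are hypothetical and are not Envision's features, data, or code.

```python
import random

def fit_stump(X, residuals):
    """Return (feature, threshold, left_value, right_value) minimising squared error."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue  # a split must leave rows on both sides
            lmean = sum(left) / len(left)
            rmean = sum(right) / len(right)
            sse = (sum((r - lmean) ** 2 for r in left)
                   + sum((r - rmean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold, lmean, rmean)
    if best is None:  # no valid split: fall back to a constant prediction
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]

def predict_stump(stump, row):
    j, threshold, left_value, right_value = stump
    return left_value if row[j] <= threshold else right_value

def fit_boosting(X, y, n_rounds=50, learning_rate=0.1, subsample=0.5, seed=0):
    """Stochastic gradient boosting: each round fits a stump to the residuals
    (the negative gradient of squared loss) on a random subsample of rows."""
    rng = random.Random(seed)
    n = len(X)
    base = sum(y) / n
    preds = [base] * n
    stumps = []
    for _ in range(n_rounds):
        idx = rng.sample(range(n), max(2, int(subsample * n)))
        residuals = [y[i] - preds[i] for i in idx]
        stump = fit_stump([X[i] for i in idx], residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, row)
                 for p, row in zip(preds, X)]
    return base, learning_rate, stumps

def predict(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)

def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

# Synthetic stand-in data (NOT real mutagenesis measurements): two hypothetical
# numeric features per variant and a noisy effect score driven by the first one.
data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(60)]
y = [(1.0 if a > 0.5 else 0.2) + data_rng.gauss(0, 0.05) for a, _ in X]

model = fit_boosting(X, y)
baseline_mse = mse(y, [sum(y) / len(y)] * len(y))
boosted_mse = mse(y, [predict(model, row) for row in X])
print("baseline MSE:", baseline_mse)
print("boosted MSE:", boosted_mse)
```

Setting `subsample` below 1 is what makes the procedure stochastic: each stump sees only a random fraction of the training rows, which adds randomness between rounds and can reduce overfitting. The errors printed here are training errors on toy data; a real evaluation would use held-out variants, as the abstract describes for the TP53 set.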
Similar resources
Hfq variant with altered RNA binding functions
The interaction between Hfq and RNA is central to multiple regulatory processes. Using site-directed mutagenesis, we have found a missense mutation in Hfq (V43R) which strongly affects the RNA binding capacity of the Hfq protein and its ability to stimulate poly(A) tail elongation by poly(A)-polymerase in vitro. In vivo, overexpression of this Hfq variant fails to stimulate rpoS-lacZ expressio...
Towards Increasing the Clinical Relevance of In Silico Methods to Predict Pathogenic Missense Variants
As genetic sequencing throughput continues to accelerate, so does the accumulation of variants of unknown clinical significance. The great majority of these variants cause amino acid substitutions (cSNVs) in protein sequence. The need to interpret these variants continues to motivate development of better in silico bioinformatic methods. Despite the development of dozens of such methods over th...
BRCA1 and BRCA2 Missense Variants of High and Low Clinical Significance Influence Lymphoblastoid Cell Line Post-Irradiation Gene Expression
The functional consequences of missense variants in disease genes are difficult to predict. We assessed if gene expression profiles could distinguish between BRCA1 or BRCA2 pathogenic truncating and missense mutation carriers and familial breast cancer cases whose disease was not attributable to BRCA1 or BRCA2 mutations (BRCAX cases). 72 cell lines from affected women in high-risk breast ovaria...
Rapid functional analysis of computationally complex rare human IRF6 gene variants using a novel zebrafish model
Large-scale sequencing efforts have captured a rapidly growing catalogue of genetic variations. However, the accurate establishment of gene variant pathogenicity remains a central challenge in translating personal genomics information to clinical decisions. Interferon Regulatory Factor 6 (IRF6) gene variants are significant genetic contributors to orofacial clefts. Although approximately three ...
Estimation of probabilities in favour of pathogenicity for missense substitutions for use in clinical evaluation of mismatch repair gene variants
A considerable proportion of Lynch syndrome families present with mismatch repair (MMR) gene sequence variants of uncertain clinical significance, which constitute a challenge in both the research and clinical settings. Such unclassified variants (UVs) include rare nucleotide changes predicted to cause missense substitutions, small in-frame deletions, or possible alterations in splicing. We are...
Journal title:
- Cell Systems
Volume 6, Issue 1
Pages -
Publication date: 2018